Model Reduction and Neural Networks for Parametric PDEs
We develop a general framework for data-driven approximation of input-output maps between infinite-dimensional spaces. The proposed approach is motivated by the recent successes of neural networks and deep learning, in combination with ideas from model reduction. This combination results in a neural network approximation which, in principle, is defined on infinite-dimensional spaces and, in practice, is robust to the dimension of finite-dimensional approximations of these spaces required for computation. For a class of input-output maps, and suitably chosen probability measures on the inputs, we prove convergence of the proposed approximation methodology. Numerically, we demonstrate the effectiveness of the method on a class of parametric elliptic PDE problems, showing convergence and robustness of the approximation scheme with respect to the size of the discretization, and compare our method with existing algorithms from the literature.
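As a rough illustration of the approach described above (not the authors' implementation), the sketch below projects discretized input and output functions onto a few PCA modes and learns the map between the reduced coordinates with a small neural network; the data, dimensions, and layer sizes are placeholder assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor

# Placeholder data standing in for discretized input/output functions
# (e.g. PDE coefficients and corresponding solutions sampled on a grid).
rng = np.random.default_rng(0)
A = rng.standard_normal((500, 256))                       # 500 inputs on 256 grid points
U = np.tanh(A @ rng.standard_normal((256, 256)) * 0.05)   # synthetic "solutions"

# Model-reduction step: project both spaces onto a few PCA modes.
pca_in, pca_out = PCA(n_components=20), PCA(n_components=20)
a_red = pca_in.fit_transform(A)
u_red = pca_out.fit_transform(U)

# Learn the map between the reduced coordinates with a small network.
net = MLPRegressor(hidden_layer_sizes=(128, 128), max_iter=2000)
net.fit(a_red, u_red)

# New input: project, map through the network, lift back to the full grid.
a_new = rng.standard_normal((1, 256))
u_pred = pca_out.inverse_transform(net.predict(pca_in.transform(a_new)))
```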
Analysis Of Momentum Methods
Gradient descent-based optimization methods underpin the parameter training that produces the impressive results now seen when testing neural networks. Introducing stochasticity is key to their success in practical problems, and there is some understanding of the role of stochastic gradient descent in this context. Momentum modifications of gradient descent, such as Polyak's Heavy Ball method (HB) and Nesterov's method of accelerated gradients (NAG), are widely adopted. In this work, our focus is on understanding the role of momentum in the training of neural networks, concentrating on the common situation in which the momentum contribution is fixed at each step of the algorithm; to expose the ideas simply we work in the deterministic setting. We show that, contrary to popular belief, standard implementations of fixed momentum methods do no more than act to rescale the learning rate. We achieve this by showing that the momentum method converges to a gradient flow, with a momentum-dependent time-rescaling, using the method of modified equations from numerical analysis. Further, we show that the momentum method admits an exponentially attractive invariant manifold on which the dynamics reduce to a gradient flow with respect to a modified loss function, equal to the original one plus a small perturbation.
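The rescaling statement can be made concrete with a standard formulation of the Heavy Ball iteration (generic notation, not necessarily that of the paper):

```latex
% Heavy Ball with learning rate \alpha and fixed momentum \beta
x_{k+1} = x_k - \alpha \nabla f(x_k) + \beta\,(x_k - x_{k-1}), \qquad 0 \le \beta < 1,
% which, in the small-step limit, tracks the time-rescaled gradient flow
\dot{x}(t) = -\nabla f\bigl(x(t)\bigr), \qquad t \approx \frac{\alpha}{1-\beta}\, k,
% i.e. fixed momentum behaves like plain gradient descent with
% effective learning rate \alpha/(1-\beta).
```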
Conditional Sampling With Monotone GANs
We present a new approach for sampling conditional probability measures, enabling consistent uncertainty quantification in supervised learning tasks. We construct a mapping that transforms a reference measure to the measure of the output conditioned on new inputs. The mapping is trained via a modification of generative adversarial networks (GANs), called monotone GANs, that imposes monotonicity and a block triangular structure. We present theoretical guarantees for the consistency of our proposed method, as well as numerical experiments demonstrating the ability of our method to accurately sample conditional measures in applications ranging from inverse problems to image in-painting.
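As a hypothetical sketch of the kind of block-triangular, monotone generator involved (this is not the authors' architecture, loss, or code), one might write, in PyTorch:

```python
import torch
import torch.nn as nn

class BlockTriangularGenerator(nn.Module):
    """Map (x, z) -> y; the full transport T(x, z) = (x, y) is block triangular
    because the x-component is passed through unchanged."""
    def __init__(self, x_dim, z_dim, y_dim, width=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim + z_dim, width), nn.ReLU(),
            nn.Linear(width, width), nn.ReLU(),
            nn.Linear(width, y_dim),
        )

    def forward(self, x, z):
        return self.net(torch.cat([x, z], dim=-1))

def monotonicity_penalty(gen, x, z, z_prime):
    """Encourage <T(x, z) - T(x, z'), z - z'> >= 0, a relaxed form of
    monotonicity in the reference variable z (requires y_dim == z_dim)."""
    diff = (gen(x, z) - gen(x, z_prime)) * (z - z_prime)
    return torch.relu(-diff.sum(dim=-1)).mean()

# The adversarial training loop is omitted; the penalty above would be added
# to the usual GAN generator loss. After training, conditional samples at a
# fixed input x_star are obtained by resampling the reference noise z:
gen = BlockTriangularGenerator(x_dim=3, z_dim=2, y_dim=2)
x_star = torch.zeros(1000, 3)          # repeat the conditioning input
z = torch.randn(1000, 2)               # reference samples
y_samples = gen(x_star, z)             # approximate draws from y | x = x_star
```

After adversarial training against samples of the joint distribution of inputs and outputs, conditional draws at a new input are obtained simply by fixing that input and resampling the reference noise, as in the last lines above.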
Ensemble Kalman Inversion: A Derivative-Free Technique For Machine Learning Tasks
The standard probabilistic perspective on machine learning gives rise to empirical risk-minimization tasks that are frequently solved by stochastic gradient descent (SGD) and variants thereof. We present a formulation of these tasks as classical inverse or filtering problems and, furthermore, we propose an efficient, gradient-free algorithm for finding a solution to these problems using ensemble Kalman inversion (EKI). The method is inherently parallelizable and is applicable to problems with non-differentiable loss functions, for which back-propagation is not possible. Applications of our approach include offline and online supervised learning with deep neural networks, as well as graph-based semi-supervised learning. The essence of the EKI procedure is an ensemble based approximate gradient descent in which derivatives are replaced by differences from within the ensemble. We suggest several modifications to the basic method, derived from empirically successful heuristics developed in the context of SGD. Numerical results demonstrate wide applicability and robustness of the proposed algorithm.
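A minimal NumPy sketch of one such ensemble update, in the standard perturbed-observation form, is given below; the function name, arguments, and shapes are illustrative assumptions rather than the paper's code.

```python
import numpy as np

def eki_step(U, G, y, Gamma, rng):
    """One ensemble Kalman inversion update (perturbed-observation variant).

    U     : (J, d) ensemble of parameter vectors
    G     : forward map, G(u) -> (m,) predictions (no derivatives needed)
    y     : (m,) observed data (e.g. training labels)
    Gamma : (m, m) observation-noise covariance
    """
    J, m = U.shape[0], len(y)
    W = np.array([G(u) for u in U])               # (J, m) ensemble predictions
    u_bar, w_bar = U.mean(axis=0), W.mean(axis=0)
    Cuw = (U - u_bar).T @ (W - w_bar) / J         # cross-covariance, (d, m)
    Cww = (W - w_bar).T @ (W - w_bar) / J         # prediction covariance, (m, m)
    Y = y + rng.multivariate_normal(np.zeros(m), Gamma, size=J)  # perturbed obs
    K = Cuw @ np.linalg.solve(Cww + Gamma, np.eye(m))            # Kalman-type gain
    return U + (Y - W) @ K.T                      # derivative-free, gradient-like step
```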
Learning Homogenization for Elliptic Operators
Multiscale partial differential equations (PDEs) arise in various applications, and several schemes have been developed to solve them efficiently. Homogenization theory is a powerful methodology that eliminates the small-scale dependence, resulting in simplified equations that are computationally tractable. In the field of continuum mechanics, homogenization is crucial for deriving constitutive laws that incorporate microscale physics in order to formulate balance laws for the macroscopic quantities of interest. However, obtaining homogenized constitutive laws is often challenging, as they do not in general have an analytic form and can exhibit phenomena not present on the microscale. In response, data-driven learning of the constitutive law has been proposed as appropriate for this task. However, a major challenge in data-driven learning approaches for this problem has remained unexplored: the impact of discontinuities and corner interfaces in the underlying material. These discontinuities in the coefficients affect the smoothness of the solutions of the underlying equations. Given the prevalence of discontinuous materials in continuum mechanics applications, it is important to address the challenge of learning in this context; in particular, to develop underpinning theory that establishes the reliability of data-driven methods in this scientific domain. This paper addresses this unexplored challenge by investigating the learnability of homogenized constitutive laws for elliptic operators in the presence of such complexities. Approximation theory is presented, and numerical experiments are performed which validate the theory for the solution operator defined by the cell problem arising in homogenization for elliptic PDEs.
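For reference, the cell problem mentioned above can be written in the standard form from homogenization theory (generic notation, not necessarily the paper's):

```latex
% Cell problem on the unit torus: for each basis direction e_i, find the
% periodic corrector \chi_i solving
-\nabla_y \cdot \bigl( A(y)\,(\nabla_y \chi_i + e_i) \bigr) = 0, \qquad y \in \mathbb{T}^d,
% and assemble the homogenized (effective) coefficient from the correctors:
\overline{A}\, e_i = \int_{\mathbb{T}^d} A(y)\,\bigl( \nabla_y \chi_i(y) + e_i \bigr)\, dy .
```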